branch-4.1: [fix](insert) Report physical file count in LoadStatistic.FileNumber #62804 #62952

Merged
yiguolei merged 1 commit into branch-4.1 from auto-pick-62804-branch-4.1
May 1, 2026

@github-actions
Contributor

Cherry-picked from #62804

…62804)

### What problem does this PR solve?

`InsertIntoTableCommand.applyInsertPlanStatistic` populated
`LoadStatistic.fileNum` from `FileScanNode.getSelectedSplitNum()`, i.e.
the BE **split count**, not the number of physical input files. When a
file crossed the split-size threshold (default
`max_initial_file_split_size × 1.1 ≈ 35.2MB`) and was cut into multiple
splits, both `jobs("type"="insert").LoadStatistic.FileNumber` and
`tasks("type"="insert").LoadStatistic.FileNumber` reported a value
larger than the actual file list. In the user-reported scenario, 8 input
files appeared as `FileNumber = 16` because each 42MB file was split in
two. Data correctness is unaffected; only the displayed statistic was
misleading.

This affects both streaming insert jobs and regular `INSERT INTO ...
SELECT FROM S3/HDFS/Hive`.
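
To make the reported numbers concrete, here is a minimal sketch of why a split count overstates the file count. It is not the Doris implementation: the class `SplitVsFileCountSketch`, its methods, the 32MB split size, the Hadoop-style 1.1 slop loop, and the `s3://bucket/...` paths are all assumptions chosen only to reproduce the "8 files, 16 splits" scenario described above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration only; none of these names exist in Doris.
class SplitVsFileCountSketch {
    static final long MB = 1024L * 1024L;
    // Assumed split size; 32MB x 1.1 gives the 35.2MB threshold quoted above.
    static final long SPLIT_SIZE = 32 * MB;
    static final double SPLIT_SLOP = 1.1;

    // One byte range of one physical file.
    static final class Split {
        final String path;
        final long start;
        final long length;

        Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    // Cut a file into splits: keep carving SPLIT_SIZE chunks while the
    // remainder exceeds SPLIT_SIZE * SPLIT_SLOP, then emit the tail.
    static List<Split> splitFile(String path, long fileSize) {
        List<Split> splits = new ArrayList<>();
        long remaining = fileSize;
        while ((double) remaining / SPLIT_SIZE > SPLIT_SLOP) {
            splits.add(new Split(path, fileSize - remaining, SPLIT_SIZE));
            remaining -= SPLIT_SIZE;
        }
        if (remaining > 0) {
            splits.add(new Split(path, fileSize - remaining, remaining));
        }
        return splits;
    }

    // Plan a scan over fileCount equally sized files (paths are made up).
    static List<Split> planScan(int fileCount, long fileSize) {
        List<Split> all = new ArrayList<>();
        for (int i = 0; i < fileCount; i++) {
            all.addAll(splitFile("s3://bucket/part-" + i + ".parquet", fileSize));
        }
        return all;
    }

    // Physical file count: number of distinct paths behind the splits.
    static int physicalFileCount(List<Split> splits) {
        Set<String> paths = new HashSet<>();
        for (Split s : splits) {
            paths.add(s.path);
        }
        return paths.size();
    }

    public static void main(String[] args) {
        List<Split> splits = planScan(8, 42 * MB);
        System.out.println("split count = " + splits.size());
        System.out.println("file count  = " + physicalFileCount(splits));
    }
}
```

Under these assumptions, each 42MB file yields one 32MB split plus a 10MB tail, so the split count is 16 while the distinct-path count is 8. Counting distinct paths is just one way to recover the physical file count in this sketch; the PR text quoted here does not say how the actual fix derives it.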

github-actions Bot requested a review from yiguolei as a code owner on April 29, 2026 14:16
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hello-stephen
Contributor

run buildall

1 similar comment
@yiguolei
Contributor

yiguolei commented May 1, 2026

run buildall

yiguolei closed this on May 1, 2026
yiguolei reopened this on May 1, 2026
@github-actions
Contributor Author

github-actions Bot commented May 1, 2026

PR approved by at least one committer and no changes requested.

github-actions Bot added the "approved" label (Indicates a PR has been approved by one committer.) on May 1, 2026
@github-actions
Contributor Author

github-actions Bot commented May 1, 2026

PR approved by anyone and no changes requested.

@JNSimba
Member

JNSimba commented May 1, 2026

run nonConcurrent

yiguolei merged commit 272520e into branch-4.1 on May 1, 2026
45 of 49 checks passed

Labels

approved (Indicates a PR has been approved by one committer.), reviewed

3 participants